Sorting Hat Research Articles

Each spring, at high noon on a Friday in mid-March, thousands of soon-to-be physicians gather with mentors and faculty to have their medical futures revealed before classmates, family, and friends. The event is the culmination of what has been an elaborate, 5-month courtship ritual. The atmosphere is a complex mix of excitement, hope, and fear. The scene is oddly reminiscent of the Sorting Hat from Harry Potter.

Participation entails an extraordinary act of faith by all involved: tens of thousands of Type A individuals abdicate control of their future—not to a bit of Hogwarts magic, but to a computer matching program. They are willing to do so because they trust that the computer will render fair outcomes. More darkly, one might say that they are coerced—they participate because the National Resident Matching Program effectively has a monopoly on entrance into the graduate medical education system. Regardless, to the extent that participants have faith in the software, it is likely well placed—the algorithm is brilliantly designed to optimize outcomes. Its creator, Alvin Roth, was later awarded the Nobel Prize in economics for this and other work with matching algorithms, most notably for organ transplantation.

The Match predated what has now become a broad societal trend to embrace big data and the power of computational approaches. We happily accept Google and Amazon's recommendations for what to read or purchase.
As physicians, we are excited by the potential to apply complex data analytics to various medical problems.[1] For all of our success, though, we sometimes forget that a central tenet of these approaches is “garbage in, garbage out” or, in the words of a Harvard Business Review commentary, “If Your Data Is Bad, Your Machine Learning Tools Are Useless.”[2] In this regard, the Match may be on shaky footing.

In this issue of the Journal of Graduate Medical Education, Hartman et al[3] explore the factors that programs typically use to build their rank order lists—the data in. The authors describe some of the problems that arise with the status quo. United States Medical Licensing Examination (USMLE) Step 1 scores, despite being one of the most commonly used metrics to assess applicants, are not good predictors of resident performance; Medical Student Performance Evaluations (MSPEs) may obscure comparative data or even suppress important negative information (a devastating betrayal when discovered post hoc); and applicant personal statements and traditional letters of recommendation may be of limited utility (exclusive of the issue that these documents are plagiarized at surprisingly high rates).[4]

Unfortunately, this may be just the tip of the iceberg. It's not just that these metrics are poor discriminators of future performance; various studies have shown that they may compound group differences—if not overt biases—relating to applicant gender and/or race.
USMLE scores show approximately a 1 standard deviation difference based on race, perhaps related to differences in socioeconomic backgrounds (as has been shown with SAT scores).[5–8] Students from groups underrepresented in medicine may receive lower clerkship grades.[8,9] Women may be described differently in letters of recommendation.[10–13] Applicants are described differently in their MSPEs based on race and gender, with white applicants disproportionately described as “excellent,” “outstanding,” and “best,” and black applicants disproportionately described as “competent.”[5] Black students are approximately 6 times less likely than other students to be inducted into the Alpha Omega Alpha (AOA) honor society.[14] Even when looking at students from the same school and with identical grades in all core clerkships, black students were still 3 times less likely than non-black students to be inducted into AOA.[8] This disparity persisted even after controlling for USMLE scores.[8]

These findings are deeply troubling for our field. They demonstrate that a program that bases decisions on ostensibly “objective” data may actually propagate implicit racial and gender-based biases that are already embedded in the system. Which is to say, it's not just garbage; it's racist and misogynistic garbage.

And yet, as Hartman et al[3] describe, we use these data to build our rank lists because they are the most accessible and most readily quantifiable. By default, they may also serve as surrogate indicators of success. And here the problem only gets worse. There is a principle in economics called Goodhart's law. Loosely described, it says that when an organization implements a new outcome metric, employees will alter their behavior to inflate performance on that metric—even if doing so may undermine the health and productivity of the organization.
For example, if a hospital starts tracking length of stay as a key outcome, patients may be discharged more rapidly and potentially prematurely, thereby causing more readmissions or other adverse events. For residency programs, what begins as a matter of convenience can become a self-propagating problem: fetishizing the wrong type of data may cause us to select the wrong applicants.

Of course, the most obvious way that residency programs may judge success in the Match is by how low they go on their rank list to fill all positions. This is also the metric that quintessentially embodies Goodhart's law: if a program is invested in filling a class from within a certain range on its list, it may change the way it ranks applicants and favor individuals it believes are more likely to matriculate over those who may be better applicants but have not disclosed interest. It must be said that such behavior directly undermines the purpose—and elegance—of the Match algorithm.

This leads us to perhaps the greatest problem relating to the Match: that programs are forced to create a single rank order list. The mere act of creating such a list implies that applicants can be arranged on a single continuous scale, from best to worst. As program directors, we might like to believe that we can reduce all of the data down to a list that roughly correlates—if only probabilistically—with how individuals will perform during training. But even if we could use these metrics to identify the “best applicants” who will go on to be the “best residents” (whatever that means), is that really what we care about the most?

Many programs pride themselves on having distinct missions beyond training excellent clinicians. These may include addressing the needs of underserved communities, developing leaders in research, or training public policy advocates.
The attributes that will predict an applicant's future success in these regards may be independent of—or perhaps even stand in conflict with—their performance on traditional metrics.

Embracing the multidimensionality of applicants requires an act of courage: it may entail selecting individuals who score lower on traditional metrics with the confidence that their strengths and future potential outweigh the risks. For example, it may mean ranking highly someone with a history of USMLE Step 1 failure but exceptional commitment to working with underserved communities. Or it may mean ranking someone with superb research credentials but worse clinical performance in medical school. In both cases, the program may expect that the resident will struggle in the short term but still believe that they can persevere and have greater impact in the end. From a game theory perspective, making these choices is a losing bet: the conventional candidate may be more likely to succeed and, even if they struggle, no one will question the decision to accept them; if the unconventional candidate struggles, complaints and second-guessing will abound—and, even if they succeed, the positive payoff may be so remote from training as to appear irrelevant.

Embracing a more holistic perspective may also alleviate some of the intrinsic stress of the Match. The more we home in on our unique goals, the more we may recognize that we are competing less with perceived rival programs than we think. For example, when I debrief with colleagues at the end of each recruitment season, I am constantly reminded: we're looking for different things in applicants. Moreover, whether we successfully recruit a talented new class may have less to do with our actions or our perceived program quality than with the state of the field as a whole.
In a good year, with many strong applicants applying in a particular specialty, all programs will do relatively well; conversely, in an “off” year, all programs will struggle together.

This leads to a critical point: if we view the Match as a zero-sum game, with other programs as adversaries, then we allow ourselves to be divided. As the field of medicine faces more and more external threats (eg, with evolving systems of care), our real goal ought to be to better ally and advocate together. Especially apropos to this point is the suggestion that pressure to score well on the USMLE may be a significant contributor to burnout, depression, and suicidality in medical students. Is it worth it to us to force students to struggle for a score that has little or no predictive value in residency?

When the students of Hogwarts step up to be sorted, the Sorting Hat looks deep into their character to determine the best fit. With rare exceptions, all of the students are qualified and all are expected to succeed. Each will achieve basic competencies and have the opportunity to thrive in their own unique way. By and large, the same is true with the Match. The fact that a student is graduating from medical school is proof of a level of skill and accomplishment. While we tend to focus on the computer part of the Match, it can never be better than its inputs—the data that we use to generate our rank lists. There will always be a role for traditional metrics, but we would be wise to temper our enthusiasm for them and embrace a more holistic and nuanced approach. We should stop thinking and talking about “best applicants” and focus more on how we can identify those who best align with each of our programs' unique missions. And maybe try for a little magic.

Book Reviews

The Cult of Personality: How Personality Tests Are Leading Us to Miseducate Our Children, Mismanage Our Companies, and Misunderstand Ourselves

Scott E. Provost, M.M., M.S.W.

Published Online: 1 Feb 2006
https://doi.org/10.1176/appi.ps.57.2.280

The use of personality tests is ubiquitous in contemporary society; the personality testing industry is a $400 million industry (1). Personality tests have been used by both large and small institutions, including schools, corporations, and hospitals, to sort, classify, categorize, and assign diagnoses to people. References to personality tests and other forms of institutional control have even made their way into books and movies. An example is how the legendary wizard Harry Potter was placed into the Gryffindor House at the Hogwarts School of Witchcraft and Wizardry by a "sorting hat" that could gauge the temperament of each student (2). Professionals who work in the mental health and addiction fields are likely to be familiar with some of the types of personality tests applied in child custody hearings, forensic and other clinical diagnostic evaluations, and screening assessments in schools.

Personality tests are often considered benign. Until now there has been a dearth of work examining the cultural history of personality tests, the development of the tests, and how the tests are administered and regulated. Annie Murphy Paul, the author of The Cult of Personality: How Personality Tests Are Leading Us to Miseducate Our Children, Mismanage Our Companies, and Misunderstand Ourselves, is uniquely equipped to explore such issues on the basis of her experience as a journalist covering mental health and psychology for a variety of mainstream publications.
Although the intended audience is a general readership, the book will be useful for a variety of professionals, including managers, psychologists, and other mental health professionals and educators.

The book is well written and traces the cultural history behind the development of many of the most widely known personality tests, including the Rorschach inkblot test, the Minnesota Multiphasic Personality Inventory, the Thematic Apperception Test, and the Myers-Briggs Type Indicator. In addition, it touches on some of the emerging trends in personality testing, such as functional magnetic resonance imaging (fMRI) and computer-enabled personality tests. Throughout the book, the author critically evaluates the uses, misuses, strengths, and weaknesses of each of the tests specifically, as well as of personality tests in general.

Some of the author's conclusions are particularly startling, including the fact that the personality testing industry is, for the most part, unregulated and that many personality tests are administered by untrained and unqualified personnel. However, perhaps the most striking conclusion is that personality tests are overly reductionistic and neglect to account for the context, situation, and environment in which an individual lives and works. Herein lies the danger of personality tests: "People are too erratic and complex to be so pigeonholed" (3) by tests that reduce complex personality traits into narrow, one-dimensional labels. Although this conclusion should not surprise most Psychiatric Services readers, considering the pervasiveness of the biopsychosocial approach to the diagnosis and treatment of psychiatric disorders, it should sound an alarm for anyone who administers personality tests, anyone who interprets the results of personality tests in clinical decision making, and, most important, those who are subjected to a personality test.
Most, if not all, ordinary individuals who are subjected to personality tests either as a condition of employment or as mandated by court order are powerless to protect themselves from the damage of being condemned to a one-dimensional label. Despite the evidence that many personality tests lack reliability and validity, they are unlikely to disappear from use in corporations, courts, schools, and other institutions in the near future. The take-home point, therefore, is caveat emptor (buyer beware).

Mr. Provost is affiliated with the alcohol and drug abuse treatment program at McLean Hospital in Belmont, Massachusetts.

by Annie Murphy Paul; New York, Free Press, 2004, 320 pages, $26

Articles published on Sorting Hat

“Sorting hat” flap as a modification of the classic A-T flap

Looking Beyond the Sorting Hat: Deconstructing the “Five Factor Model” of Alienation

Harry’s Mirror: Desire, fantasy and the Mirror of Erised in Harry Potter and the Philosopher’s Stone

Sorting Through “New and Improved” Versions of J. K. Rowling’s Sorting Hat

Ranking the synthesizability of hypothetical zeolites with the sorting hat.

Putting the Sorting Hat on J.K. Rowling’s Reader: A digital inquiry into the age of the implied readership of the Harry Potter series

The Sorting Hat: Cool Fiction Element but Not Necessarily a Good Career Advisor

Why we put on the sorting hat: motivations to take fan personality tests

PAAR proteins act as the ‘sorting hat’ of the type VI secretion system

The Match: Magic Versus Machines.

The Science Behind the Magic? The Relation of the Harry Potter “Sorting Hat Quiz” to Personality and Human Values

Virtual sorting hat™ technology for the matching of candidates to residency training programs.

Admitting the Patient With Acute Stroke to the Right House-Lessons From the Sorting Hat of Hogwarts.

The Darker Side of the Sorting Hat: Representations of Educational Testing in Dystopian Young Adult Fiction

The Sorting Hat Goes to College

A supramolecular sorting hat: stereocontrol in metal-ligand self-assembly by complementary hydrogen bonding.

A Supramolecular Sorting Hat: Stereocontrol in Metal–Ligand Self‐Assembly by Complementary Hydrogen Bonding

Individual Predisposition for Learning and Neuroplasticity

Acetylation Deficiency and the Germinal Center of Doom: Lymphoma and the Sorting HAT

The Cult of Personality: How Personality Tests Are Leading Us to Miseducate Our Children, Mismanage Our Companies, and Misunderstand Ourselves
